Search for: All records

Creators/Authors contains: "Hua, X."

Note: When clicking on a Digital Object Identifier (DOI) link, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Automating the extraction of argument structures faces two challenges: (1) encoding long-range context to support comprehensive understanding, and (2) improving data efficiency, since constructing high-quality argument structures is time-consuming. In this work, we propose a novel context-aware Transformer-based argument structure prediction model that, across five different domains, significantly outperforms feature-based models and models that encode only limited context. To tackle the difficulty of data annotation, we examine two complementary methods: (i) transfer learning, which leverages existing annotated data to boost model performance in a new target domain, and (ii) active learning, which strategically identifies a small number of samples for annotation. We further propose model-independent sample acquisition strategies that generalize to diverse domains. Extensive experiments show that our simple yet effective acquisition strategies yield competitive results against three strong comparisons. Combined with transfer learning, a substantial F1 boost (5-25 points) can be achieved during the early iterations of active learning across domains. 
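The active learning setup described above can be sketched as a pool-based loop: score unlabeled samples with a model-independent acquisition function, annotate the top-ranked batch, and retrain. The `novelty_score` heuristic below is a hypothetical stand-in for the paper's actual acquisition strategies, which are not specified here:

```python
# Minimal sketch of pool-based active learning with a model-independent
# acquisition function. `novelty_score` is a hypothetical diversity
# heuristic, not the paper's method: it prefers samples with low token
# overlap against the already-labeled set.

def novelty_score(sample, labeled):
    """Score a sample higher when it shares few tokens with labeled data."""
    tokens = set(sample.split())
    if not labeled:
        return 1.0
    overlap = max(len(tokens & set(s.split())) / max(len(tokens), 1)
                  for s in labeled)
    return 1.0 - overlap

def active_learning_loop(pool, budget_per_round, rounds, train_fn):
    labeled, unlabeled = [], list(pool)
    model = None
    for _ in range(rounds):
        # Rank the remaining pool by acquisition score; take the top-k.
        unlabeled.sort(key=lambda s: novelty_score(s, labeled), reverse=True)
        batch = unlabeled[:budget_per_round]
        unlabeled = unlabeled[budget_per_round:]
        labeled.extend(batch)          # annotation is simulated here
        model = train_fn(labeled)      # retrain on all labeled data
    return labeled, model

# Toy usage with a trivial "model" (just the labeled-set size).
picked, model = active_learning_loop(
    ["a b c", "a b d", "x y z", "x y w"],
    budget_per_round=1, rounds=2,
    train_fn=lambda data: len(data))
```

Because the acquisition function never consults the model, the same loop can be reused unchanged across domains and model architectures.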
  2. Building effective text generation systems requires three critical components: content selection, text planning, and surface realization; traditionally, these are tackled as separate problems. Recent all-in-one neural generation models have made impressive progress, yet they often produce outputs that are incoherent and unfaithful to the input. To address these issues, we present an end-to-end trained two-step generation model in which a sentence-level content planner first decides on the keyphrases to cover and a desired language style, and a surface realization decoder then generates relevant and coherent text. For experiments, we consider three tasks from domains with diverse topics and varying language styles: persuasive argument construction from Reddit, paragraph generation for normal and simple versions of Wikipedia, and abstract generation for scientific articles. Automatic evaluation shows that our system significantly outperforms competitive comparisons. Human judges further rate our system's output as more fluent and correct than the generations of variants that do not model language style. 
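The two-step structure described above (plan, then realize) can be illustrated with rule-based stand-ins for the trained neural decoders. The style labels and templates below are hypothetical and only show how a planner's per-sentence decisions condition the realizer:

```python
# Illustrative plan-then-realize pipeline. The real system uses trained
# neural decoders; these rule-based stand-ins only show the interface
# between the two steps. Style labels ("claim", "evidence") are invented
# for the example.

def plan(input_keyphrases, max_per_sentence=2):
    """Planner: group keyphrases into per-sentence plans, each with a style."""
    plans = []
    for i in range(0, len(input_keyphrases), max_per_sentence):
        chunk = input_keyphrases[i:i + max_per_sentence]
        style = "claim" if i == 0 else "evidence"
        plans.append({"keyphrases": chunk, "style": style})
    return plans

def realize(sent_plan):
    """Realizer: turn one sentence plan into surface text (template-based)."""
    joined = " and ".join(sent_plan["keyphrases"])
    if sent_plan["style"] == "claim":
        return f"We argue that {joined} matter."
    return f"This is supported by {joined}."

def generate(input_keyphrases):
    return " ".join(realize(p) for p in plan(input_keyphrases))

print(generate(["school funding", "teacher pay", "class size"]))
```

Separating the plan from the realization is what lets the system control content coverage and style independently of fluency.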
  3. Automatic argument generation is an appealing but challenging task. In this paper, we study the specific problem of counter-argument generation and present a novel framework, CANDELA. It consists of a powerful retrieval system and a novel two-step generation model, in which a text planning decoder first decides on the main talking points and a proper language style for each sentence, and a content realization decoder then reflects those decisions to construct an informative paragraph-level argument. Our generation model is further empowered by a retrieval system indexed with 12 million articles collected from Wikipedia and popular English news media, which provides access to high-quality, diverse content. Automatic evaluation on a large-scale dataset collected from Reddit shows that our model yields significantly higher BLEU, ROUGE, and METEOR scores than state-of-the-art and non-trivial comparisons. Human evaluation further indicates that our system's arguments are more appropriate for refutation and richer in content. 
  4. Peer review plays a critical role in the scientific writing and publication ecosystem. To assess the efficiency and efficacy of the reviewing process, one essential element is understanding and evaluating the reviews themselves. In this work, we study the content and structure of peer reviews under the argument mining framework, automatically detecting (1) the argumentative propositions put forward by reviewers, and (2) their types (e.g., evaluating the work or making suggestions for improvement). We first collect 14.2K reviews from major machine learning and natural language processing venues; 400 reviews are annotated with 10,386 propositions, each labeled as Evaluation, Request, Fact, Reference, or Quote. We then train state-of-the-art proposition segmentation and classification models on the data to evaluate their utility and to identify new challenges in this domain, motivating future directions for argument mining. Further experiments show that proposition usage varies across venues in amount, type, and topic. 
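The proposition-typing task above can be illustrated with a toy keyword baseline over the five annotated types. The real models are trained neural segmenters and classifiers; the heuristic rules below are illustrative only and merely show the label space and the per-proposition prediction interface:

```python
# Toy stand-in for the proposition classification task: map each
# (pre-segmented) proposition to one of the five annotated types.
# The keyword patterns are hypothetical, not from the paper.

import re

def classify_proposition(text):
    t = text.lower()
    if t.startswith('"') or t.startswith('\u201c'):
        return "Quote"                       # verbatim quoted material
    if re.search(r"et al\.|\[\d+\]", t):
        return "Reference"                   # citation markers
    if re.search(r"\b(should|please|suggest|could)\b", t):
        return "Request"                     # suggestions for improvement
    if re.search(r"\b(novel|interesting|weak|unclear|well-written)\b", t):
        return "Evaluation"                  # judgments about the work
    return "Fact"                            # default: factual description

review = [
    "The paper proposes a new attention variant.",
    "The idea is novel and interesting.",
    "The authors should compare against stronger baselines.",
    "Similar ideas appear in [3].",
]
labels = [classify_proposition(p) for p in review]
```

Even this crude baseline makes the evaluation setup concrete: given segmented propositions, the model's job is a five-way classification, scored against the annotated types.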
  5. High-quality arguments are essential to human reasoning and decision-making. However, effective argument construction is a challenging task for both humans and machines. In this work, we study the novel task of automatically generating an argument of a different stance for a given statement. We propose an encoder-decoder neural argument generation model enriched with evidence retrieved from Wikipedia. Our model first generates a set of talking-point phrases as an intermediate representation; a separate decoder then produces the final argument based on both the input and the keyphrases. Experiments on a large-scale dataset collected from Reddit show that, according to both automatic evaluation and human assessment, our model constructs arguments with more topic-relevant content than popular sequence-to-sequence generation models. 
  6. We investigate the problem of sentence-level supporting-argument detection from documents relevant to user-specified claims. A dataset of claims and associated citation articles is collected from the online debate website idebate.org. We then manually label sentence-level supporting arguments in the documents, along with their types: study, factual, opinion, or reasoning. We further characterize arguments of the different types and explore whether leveraging type information can facilitate the supporting-argument detection task. Experimental results show that a LambdaMART (Burges, 2010) ranker using features informed by argument types outperforms the same ranker trained without type information. 
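To illustrate how predicted argument types can inform a ranker's features, the sketch below appends a one-hot type encoding to simple lexical features and scores candidates with hypothetical hand-set weights. LambdaMART itself is a gradient-boosted listwise ranker; the linear scorer here merely stands in for it:

```python
# Sketch of type-informed ranking features. The feature set, weights, and
# example data are invented for illustration; a real system would learn
# the weights with LambdaMART (Burges, 2010) rather than set them by hand.

ARG_TYPES = ["study", "factual", "opinion", "reasoning"]

def features(sentence, claim, predicted_type):
    """Base lexical features plus a one-hot encoding of the argument type."""
    overlap = len(set(sentence.lower().split()) & set(claim.lower().split()))
    one_hot = [1.0 if predicted_type == t else 0.0 for t in ARG_TYPES]
    return [float(overlap), float(len(sentence.split()))] + one_hot

def score(feat, weights):
    return sum(f * w for f, w in zip(feat, weights))

# Hypothetical weights: claim overlap matters most; 'study' and 'factual'
# sentences get a boost, 'opinion' a penalty.
W = [1.0, 0.01, 0.5, 0.4, -0.2, 0.1]

claim = "School uniforms improve student focus"
candidates = [
    ("A 2015 study found uniforms improve focus in schools", "study"),
    ("I personally dislike school uniforms", "opinion"),
]
ranked = sorted(candidates,
                key=lambda c: score(features(c[0], claim, c[1]), W),
                reverse=True)
```

The point of the experiment above is exactly this contrast: with the type features zeroed out, the ranker can no longer prefer study-backed sentences over opinions with similar lexical overlap.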
  8. Abstract

    Despite extensive studies on size effects in ferroelectrics, how structures and properties evolve in antiferroelectrics with reduced dimensions remains elusive. Given the enormous potential of antiferroelectrics for high-energy-density storage applications, understanding their size effects will provide key information for optimizing device performance at small scales. Here, the fundamental intrinsic size dependence of antiferroelectricity in lead-free NaNbO3 membranes is investigated. Via a wide range of experimental and theoretical approaches, an intriguing antiferroelectric-to-ferroelectric transition upon reducing membrane thickness is probed. This size effect leads to a single-phase ferroelectric state below 40 nm, and a mixed-phase state with coexisting ferroelectric and antiferroelectric orders above this critical thickness. Furthermore, it is shown that the antiferroelectric and ferroelectric orders are electrically switchable. First-principles calculations further reveal that the observed transition is driven by structural distortion arising from the membrane surface. This work provides direct experimental evidence for intrinsic size-driven scaling in antiferroelectrics and demonstrates the enormous potential of utilizing size effects to drive emergent properties in environmentally benign lead-free oxides on the membrane platform.

     
  9. Abstract

    The world is facing a crisis of language loss that rivals, or exceeds, the rate of loss of biodiversity. There is increasing urgency to understand the drivers of language change in order to stem the catastrophic rate of language loss globally and to improve language vitality. Here we present a unique case study of language shift in an endangered Indigenous language, with a dataset of unprecedented scale. We employ a novel multidimensional analysis, which combines the strength of a quantitative approach with the detail of individual speakers and specific language variables, to identify social, cultural, and demographic factors that influence language shift in this community. We develop the concept of the ‘linguatype’, a sample of an individual’s language variants, analogous to the geneticists’ concept of ‘genotype’ as a sample of an individual’s genetic variants. We use multidimensional clustering to show that while family and household have significant effects on language patterns, peer group is the most significant factor for predicting language variation. Generalized linear models demonstrate that the strongest factor promoting individual use of the Indigenous language is living with members of the older generation who speak the heritage language fluently. Wright–Fisher analysis indicates that production of the heritage language is lost at a significantly faster rate than perception, but there is no significant difference between the rates of loss for verbs vs. nouns, or for lexicon vs. grammar. Notably, we show that formal education has a negative relationship with Indigenous language retention in this community, with decreased use of the Indigenous language significantly associated with more years of monolingual schooling in English. These results suggest practical strategies for strengthening Indigenous language retention and demonstrate a new analytical approach to identifying risk factors for language loss that may be applicable to many languages globally.
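The Wright–Fisher framing above treats a language variant like an allele whose frequency drifts between generations. A generic Wright–Fisher simulation (not the paper's fitted model) looks like the sketch below, where a negative selection coefficient models faster loss of the heritage variant; all parameter values are illustrative:

```python
# Generic Wright-Fisher simulation of a variant's frequency under drift
# plus selection. Each generation, n_speakers "speakers" independently
# adopt the heritage variant with probability equal to its current
# (selection-adjusted) frequency. Parameters here are illustrative only.

import random

def wright_fisher(n_speakers, p0, s, generations, rng):
    """Return the variant's frequency trajectory across generations."""
    p = p0
    traj = [p]
    for _ in range(generations):
        # Selection shifts the sampling probability before drift acts.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        count = sum(rng.random() < p_sel for _ in range(n_speakers))
        p = count / n_speakers
        traj.append(p)
    return traj

rng = random.Random(0)  # fixed seed for a reproducible trajectory
traj = wright_fisher(n_speakers=200, p0=0.8, s=-0.05, generations=50, rng=rng)
```

Fitting such a model separately to production and perception data is what allows the comparison reported above: a significantly more negative effective selection coefficient for production than for perception.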

     